2/3/2021

Why learn linear algebra?

Applications include:

  • differential equations and dynamical systems
  • Markov chains
  • circuits
  • social network analysis
  • frequency analysis
  • Google’s PageRank algorithm
  • machine learning

Goal of this tutorial: Explain linear models (regression) using linear algebra

Vectors

  • A vector is a list of numbers.
  • The dimension of a vector is the same as its length.

Example: if \(v = (42, \pi, 0, -e)\) then \(\dim(v) = 4\)

Example: \(v = (4, 3)\)

Basis

  • A basis is a coordinate system
  • In 2D we typically use the Euclidean basis with basis vectors \(\hat{i}=(1,0)\) and \(\hat{j}=(0,1)\)

Example: The vector \((4,3)\) in the Euclidean basis is written as such because \[ \begin{pmatrix} 4 \\ 3 \end{pmatrix} = 4\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 3\begin{pmatrix} 0 \\ 1\end{pmatrix} \] Say we want to change basis to using \(v_1 = (1,2)\) and \(v_2 = (1.5, 0.5)\) instead. Observe, \[ \begin{pmatrix} 4 \\ 3 \end{pmatrix} = 1\begin{pmatrix} 1 \\ 2 \end{pmatrix} + 2\begin{pmatrix} 1.5 \\ 0.5 \end{pmatrix} \] so \((4,3)\) is actually written as \((1,2)\) in this new basis.
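Finding coordinates in a new basis amounts to solving a small linear system, which we can check in R using the \(v_1\), \(v_2\) from the example:

```r
# columns are the new basis vectors v1 = (1, 2) and v2 = (1.5, 0.5)
B <- cbind(c(1, 2), c(1.5, 0.5))
# solve B %*% coords = (4, 3) for the coordinates in the new basis
solve(B, c(4, 3))
## [1] 1 2
```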

Matrices

Application: Linear Models

  • Consider an additive model with \(p\) predictors.
  • The model for the \(i\)th observation is

\[ y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_px_{ip} + \varepsilon_i \]

  • If we wrote out the model for all \(n\) observations it would be

\[ \begin{align} y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_p x_{1p} + \varepsilon_1\\ y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_p x_{2p} + \varepsilon_2\\ &\hspace{2mm}\vdots\\ y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_p x_{np} + \varepsilon_n \end{align} \]

Application: Linear Models

  • Let \(X\) be the \(n \times (p+1)\) matrix whose first column is all ones and whose entry \(X_{ij}\), for \(j = 2, \ldots, p+1\), is the value of the \((j-1)\)th predictor for the \(i\)th observation
  • \(X\) is called the design matrix
  • The previous system of equations can be written compactly as \[ y = X\beta + \varepsilon \] where \(\dim(y) = \dim(\varepsilon) = n\) and \(\beta = (\beta_0, \beta_1, \ldots, \beta_p)\)
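Building a design matrix in R is just prepending a column of ones to the predictors; a minimal sketch with made-up predictor values:

```r
predictors <- cbind(c(2.1, 0.5, 3.3), c(7, 4, 9))  # n = 3 observations, p = 2 predictors (illustrative values)
X <- cbind(1, predictors)                          # prepend the intercept column of ones
dim(X)                                             # n x (p + 1)
## [1] 3 3
```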

Application: Linear Models

Want to find the “best” estimates of \(\beta\)

  • Least squares

    • Find \(\hat{\mathbf{y}} = \mathbf{X}\hat{\beta}\) that minimizes \(\sum_{i = 1}^n (\hat{y}_i - y_i)^2\)

    • Take the derivative of the sum of squares, set it to zero, and check that the solution is a minimum; the Gauss-Markov theorem then says this least-squares estimator is the best linear unbiased estimator

  • Or using linear algebra
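To see that "minimize the sum of squares" is really what least squares does, we can minimize the residual sum of squares numerically and compare against lm (the toy data below are made up for illustration):

```r
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)
rss <- function(b) sum((y - (b[1] + b[2] * x))^2)  # residual sum of squares
fit <- optim(c(0, 0), rss)                         # numeric minimization
fit$par                                            # close to coef(lm(y ~ x))
```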

Application: Linear Models

Simple example

\[ \mathbf{y} = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}, \mathbf{X} = \begin{pmatrix} 1 & 4 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \mathbf{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} \]

The equation \(\mathbf{y} = \mathbf{X}\beta\) doesn’t have a solution because the system is overdetermined: three equations in only two unknowns

Application: Linear Models

\((\beta_0, \beta_1)\) are the coordinates of \(\mathbf{X}\beta\) in the column space of \(\mathbf{X}\):

\[ \begin{align} \mathbf{X}\beta &= \begin{pmatrix} 1 & 4 \\ 1 & 3 \\ 1 & 4 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}\\ &= \beta_0 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} 4 \\ 3 \\ 4 \end{pmatrix} \end{align} \]

If \(\mathbf{y}\) is not in the column space of \(\mathbf{X}\), then there is no \(\beta\) such that \(\mathbf{y} = \mathbf{X}\beta\)

We want to choose \(\hat{\beta}\) so that \(\mathbf{X}\hat{\beta}\) is as close as possible to \(\mathbf{y}\)

[Figure: \(\mathbf{y}\) and its projection onto the column space of \(\mathbf{X}\). Image: Stpasha, Public domain, via Wikimedia Commons]

Application: Linear Models

Project \(\mathbf{y}\) onto column space of \(\mathbf{X}\)

\[\mathbf{X}\hat{\beta} = \mathbf{X}(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} = \mathbf{P}\mathbf{y}\]

\(\mathbf{P}\) is the projection matrix

\(\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}\)

Application: Linear Models

x <- cbind(c(1, 1, 1), c(4, 3, 4))     # design matrix from the simple example
p <- x %*% solve(t(x) %*% x) %*% t(x)  # projection matrix P = X (X^T X)^{-1} X^T
p
##      [,1] [,2] [,3]
## [1,]  0.5    0  0.5
## [2,]  0.0    1  0.0
## [3,]  0.5    0  0.5
y <- c(2, 2, 1)
p %*% y  # project y onto the column space of x
##      [,1]
## [1,]  1.5
## [2,]  2.0
## [3,]  1.5

Application: Linear Models

\[ \mathbf{P}\mathbf{y} = \begin{pmatrix} 1.5 \\ 2 \\ 1.5 \end{pmatrix} = 3.5 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - 0.5 \begin{pmatrix} 4 \\ 3 \\ 4 \end{pmatrix} \]

So

\(\hat{\beta}_0 = 3.5, \hat{\beta}_1 = -0.5\)
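We can confirm these values with the closed form \(\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}\) from the previous slide:

```r
x <- cbind(c(1, 1, 1), c(4, 3, 4))
y <- c(2, 2, 1)
solve(t(x) %*% x) %*% t(x) %*% y  # betahat = (X^T X)^{-1} X^T y
##      [,1]
## [1,]  3.5
## [2,] -0.5
```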

Application: Linear Models

lm(y ~ x - 1)  # "- 1" drops lm's automatic intercept since x already has a column of ones
## 
## Call:
## lm(formula = y ~ x - 1)
## 
## Coefficients:
##   x1    x2  
##  3.5  -0.5

Application: Linear Models

\[Y = \begin{pmatrix} y_{11} & y_{12} \\ \vdots & \vdots \\ y_{n1} & y_{n2} \end{pmatrix}\]

\[X = \begin{pmatrix} 1 & x_{11} \\ \vdots & \vdots \\ 1 & x_{n1} \end{pmatrix}\]

\[\beta = \begin{pmatrix} \beta_{01} & \beta_{02} \\ \beta_{11} & \beta_{12} \end{pmatrix}\]

Application: Linear Models

x <- cbind(rep(1, 10), 0.1*1:10)                                      # design matrix: intercept plus one predictor
y <- cbind(rnorm(10, x %*% c(4, 2)), rnorm(10, x %*% c(6, 1), 0.3))   # two response columns
betahat <- solve(t(x) %*% x) %*% t(x) %*% y                           # least-squares fit for both columns at once
betahat
##          [,1]     [,2]
## [1,] 4.502601 5.711568
## [2,] 1.444655 1.346630
lm_res <- lm(y ~ x - 1)
lm_res$coefficients
##        [,1]     [,2]
## x1 4.502601 5.711568
## x2 1.444655 1.346630

The Eigenvalue Equation

  • If there is a vector \(v\) with an associated scalar \(\lambda\) such that \[Av = \lambda v\] we say \(v\) and \(\lambda\) are an eigenvector and eigenvalue of \(A\)
  • Together, \((\lambda, v)\) are called an eigenpair
  • “Eigen” is German for “own” or “inherent” so think of eigenvectors as special vectors associated with \(A\)

Eigenvectors of a 2x2
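As a concrete check of \(Av = \lambda v\) for a 2x2 matrix (the matrix values here are illustrative):

```r
A <- matrix(c(2, 1, 1, 2), 2, 2)  # a symmetric 2x2 matrix
e <- eigen(A)
e$values                          # eigenvalues: 3 and 1
v <- e$vectors[, 1]               # first eigenvector
A %*% v                           # equals e$values[1] * v
```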

Application: Game Theory

  • Suppose Alice, Bob, and Charlie have some initial amount of money in their account: \(a_0\), \(b_0\) and \(c_0\)
  • The next day:
    • Alice’s account decreases by 2.5% but increases by 5% of Bob’s account balance and decreases by 5% of Charlie’s account balance.
    • Bob’s account decreases by 7.5% but increases by 3.75% of Alice’s account balance and decreases by 32.5% of Charlie’s account balance.
    • Charlie’s account decreases by 40%.
  • Who will end up with the most money? \[ \begin{pmatrix} a_{t+1} \\ b_{t+1} \\ c_{t+1} \end{pmatrix} = a_t\begin{pmatrix} 0.975 \\ 0.0375 \\ 0 \end{pmatrix} + b_t\begin{pmatrix} 0.05 \\ 0.925 \\ 0 \end{pmatrix} + c_t\begin{pmatrix} -0.05 \\ -0.325 \\ 0.6 \end{pmatrix} \]
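We can answer this empirically by iterating the update above (the starting balances of 100 each are an arbitrary choice):

```r
# columns are the coefficient vectors for a_t, b_t, c_t from the update above
A <- cbind(c(0.975, 0.0375, 0), c(0.05, 0.925, 0), c(-0.05, -0.325, 0.6))
balances <- c(100, 100, 100)  # arbitrary starting balances for Alice, Bob, Charlie
for (day in 1:500) balances <- A %*% balances
round(balances, 2)            # Alice ends with twice Bob's balance; Charlie is broke
```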

Application: Game Theory

  • The eigenpairs for the coefficient matrix were \[ \lambda_1 = 1, v_1 = \begin{pmatrix} 1 \\ 0.5 \\ 0 \end{pmatrix}, \quad \lambda_2 = 0.9, v_2 = \begin{pmatrix} -1 \\ 1.5 \\ 0 \end{pmatrix}, \quad \lambda_3 = 0.6, v_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \]
  • Since \(\lambda_1\) dominates the eigenvalues the system will, in the long run, converge to the line spanned by \(v_1\).
  • So for many initial conditions we should expect Alice to have twice as much money as Bob and for Charlie to be broke.
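These eigenpairs can be checked with eigen(), which returns unit-length eigenvectors, so they differ from the ones above only by scaling:

```r
A <- cbind(c(0.975, 0.0375, 0), c(0.05, 0.925, 0), c(-0.05, -0.325, 0.6))
e <- eigen(A)
e$values                          # 1.0, 0.9, 0.6
e$vectors[, 1] / e$vectors[1, 1]  # rescaled: (1, 0.5, 0), the dominant direction
```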

Example: Multivariate Normal

From class

If \(X\) is a random vector with a multivariate Normal distribution, we say \[\begin{aligned} X = (X_1, \ldots, X_k) \sim \mathrm{MVN}(\mu, \Sigma) . \end{aligned}\]

The parameters are the mean vector \(\mu\) and covariance matrix \(\Sigma\): \[\begin{aligned} \mathrm{E}[X_i] = \mu_i \end{aligned}\] and \[\begin{aligned} \mathrm{cov}[X_i, X_j] = \Sigma_{i,j} . \end{aligned}\]

Example: a univariate linear model

Let’s say that \(X \sim \mathrm{Normal}(0, 1)\) and \[\begin{aligned} Y &= \beta X + \epsilon \\ \epsilon &\sim \mathrm{Normal}(0, \sigma) . \end{aligned}\]


Then \(Y\) also has a Normal distribution, and \[\begin{aligned} \mathrm{var}[X] &= 1, \\ \mathrm{var}[Y] &= \beta^2 + \sigma^2 \qquad \text{and} \\ \mathrm{cov}[X, Y] &= \beta, \end{aligned}\]

so \[\begin{aligned} (X, Y) \sim \mathrm{MVN}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \beta \\ \beta & \beta^2 + \sigma^2 \end{bmatrix} \right) . \end{aligned}\]

With \(\beta = 0.7\) and \(\sigma = 1\):
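A quick simulation with these values confirms the variance and covariance calculations (the sample size and seed are arbitrary):

```r
set.seed(1)                   # arbitrary seed for reproducibility
beta <- 0.7; sigma <- 1
n <- 1e5
x <- rnorm(n)
y <- beta * x + rnorm(n, 0, sigma)
c(var(x), var(y), cov(x, y))  # approx 1, beta^2 + sigma^2 = 1.49, beta = 0.7
```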

Eigenvalues and Eigenvectors

covmat  # the covariance matrix Sigma for beta = 0.7, sigma = 1 (entries rounded)
##      [,1] [,2]
## [1,]  1.0  0.7
## [2,]  0.7  1.5
eigen(covmat)
## eigen() decomposition
## $values
## [1] 2.0 0.5
## 
## $vectors
##      [,1]  [,2]
## [1,] 0.58 -0.82
## [2,] 0.82  0.58

The Eigenbasis

  • Most square matrices have a full set of linearly independent eigenvectors, i.e. they are diagonalizable
  • Expanding coordinates in terms of the eigenvectors turns \(A\) into a diagonal matrix, making arithmetic easy
  • So the eigenvector coordinates are in some sense the “best” coordinates for \(A\)
  • A rotation matrix won’t have a real eigenpair, so if your eigenvalues are complex it suggests some sort of rotation is happening
  • When \(A\) is symmetric, i.e. \(A_{ij} = A_{ji}\), it will have real eigenvalues and orthogonal eigenvectors
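For a symmetric matrix the decomposition \(A = V \Lambda V^T\) makes powers of \(A\) trivial to compute; a small sketch (the matrix values are illustrative):

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)          # symmetric, so eigenvectors are orthogonal
e <- eigen(A)
V <- e$vectors                            # orthogonal: t(V) %*% V is the identity
Lam <- diag(e$values)
max(abs(V %*% Lam %*% t(V) - A))          # ~0: A is recovered from its eigendecomposition
A5 <- V %*% diag(e$values^5) %*% t(V)     # A^5 via the diagonal, no repeated multiplication
max(abs(A5 - A %*% A %*% A %*% A %*% A))  # ~0
```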

Additional Sources